White wine is a wine whose colour can be straw-yellow, yellow-green, or yellow-gold. It is produced by the alcoholic fermentation of the non-coloured pulp of grapes, which may have a skin of any colour. White wine has existed for at least 2500 years.
Citation Request: This dataset is publicly available for research. The details are described in [Cortez et al., 2009].
P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553. ISSN: 0167-9236.
Available at: [@Elsevier] http://dx.doi.org/10.1016/j.dss.2009.05.016 [Pre-press (pdf)] http://www3.dsi.uminho.pt/pcortez/winequality09.pdf [bib] http://www3.dsi.uminho.pt/pcortez/dss09.bib
The inputs include objective tests (e.g. pH values) and the output is based on sensory data (median of at least 3 evaluations made by wine experts). Each expert graded the wine quality between 0 (very bad) and 10 (very excellent).
1. Title: Wine Quality
2. Sources: Created by: Paulo Cortez (Univ. Minho), Antonio Cerdeira, Fernando Almeida, Telmo Matos and Jose Reis (CVRVV) @ 2009
3. Number of Instances: 4898.
4. Number of Attributes: 13 attributes
Attribute information:
Input variables (based on physicochemical tests):
1 - fixed acidity (tartaric acid - g / dm^3)
2 - volatile acidity (acetic acid - g / dm^3)
3 - citric acid (g / dm^3)
4 - residual sugar (g / dm^3)
5 - chlorides (sodium chloride - g / dm^3)
6 - free sulfur dioxide (mg / dm^3)
7 - total sulfur dioxide (mg / dm^3)
8 - density (g / cm^3)
9 - pH
10 - sulphates (potassium sulphate - g / dm^3)
11 - alcohol (% by volume)
Output variable (based on sensory data): 12 - quality (score between 0 and 10)
Description of attributes:
1 - fixed acidity: most acids involved with wine are fixed or nonvolatile (do not evaporate readily)
2 - volatile acidity: the amount of acetic acid in wine, which at too high of levels can lead to an unpleasant, vinegar taste
3 - citric acid: found in small quantities, citric acid can add ‘freshness’ and flavor to wines
4 - residual sugar: the amount of sugar remaining after fermentation stops, it’s rare to find wines with less than 1 gram/liter and wines with greater than 45 grams/liter are considered sweet
5 - chlorides: the amount of salt in the wine
6 - free sulfur dioxide: the free form of SO2 exists in equilibrium between molecular SO2 (as a dissolved gas) and bisulfite ion; it prevents microbial growth and the oxidation of wine
7 - total sulfur dioxide: amount of free and bound forms of SO2; in low concentrations, SO2 is mostly undetectable in wine, but at free SO2 concentrations over 50 ppm, SO2 becomes evident in the nose and taste of wine
8 - density: the density of wine is close to that of water depending on the percent alcohol and sugar content
9 - pH: describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic); most wines are between 3-4 on the pH scale
10 - sulphates: a wine additive which can contribute to sulfur dioxide gas (SO2) levels, which acts as an antimicrobial and antioxidant
11 - alcohol: the percent alcohol content of the wine
Output variable (based on sensory data): 12 - quality (score between 0 and 10)
##   X fixed.acidity volatile.acidity citric.acid residual.sugar chlorides
## 1 1           7.0             0.27        0.36           20.7     0.045
## 2 2           6.3             0.30        0.34            1.6     0.049
## 3 3           8.1             0.28        0.40            6.9     0.050
## 4 4           7.2             0.23        0.32            8.5     0.058
## 5 5           7.2             0.23        0.32            8.5     0.058
## 6 6           8.1             0.28        0.40            6.9     0.050
##   free.sulfur.dioxide total.sulfur.dioxide density   pH sulphates alcohol
## 1                  45                  170  1.0010 3.00      0.45     8.8
## 2                  14                  132  0.9940 3.30      0.49     9.5
## 3                  30                   97  0.9951 3.26      0.44    10.1
## 4                  47                  186  0.9956 3.19      0.40     9.9
## 5                  47                  186  0.9956 3.19      0.40     9.9
## 6                  30                   97  0.9951 3.26      0.44    10.1
##   quality
## 1       6
## 2       6
## 3       6
## 4       6
## 5       6
## 6       6
## [1] "X" "fixed.acidity" "volatile.acidity"
## [4] "citric.acid" "residual.sugar" "chlorides"
## [7] "free.sulfur.dioxide" "total.sulfur.dioxide" "density"
## [10] "pH" "sulphates" "alcohol"
## [13] "quality"
## fixed.acidity volatile.acidity citric.acid residual.sugar
## Min. : 3.800 Min. :0.0800 Min. :0.0000 Min. : 0.600
## 1st Qu.: 6.300 1st Qu.:0.2100 1st Qu.:0.2700 1st Qu.: 1.700
## Median : 6.800 Median :0.2600 Median :0.3200 Median : 5.200
## Mean : 6.855 Mean :0.2782 Mean :0.3342 Mean : 6.391
## 3rd Qu.: 7.300 3rd Qu.:0.3200 3rd Qu.:0.3900 3rd Qu.: 9.900
## Max. :14.200 Max. :1.1000 Max. :1.6600 Max. :65.800
## chlorides free.sulfur.dioxide total.sulfur.dioxide
## Min. :0.00900 Min. : 2.00 Min. : 9.0
## 1st Qu.:0.03600 1st Qu.: 23.00 1st Qu.:108.0
## Median :0.04300 Median : 34.00 Median :134.0
## Mean :0.04577 Mean : 35.31 Mean :138.4
## 3rd Qu.:0.05000 3rd Qu.: 46.00 3rd Qu.:167.0
## Max. :0.34600 Max. :289.00 Max. :440.0
## density pH sulphates alcohol
## Min. :0.9871 Min. :2.720 Min. :0.2200 Min. : 8.00
## 1st Qu.:0.9917 1st Qu.:3.090 1st Qu.:0.4100 1st Qu.: 9.50
## Median :0.9937 Median :3.180 Median :0.4700 Median :10.40
## Mean :0.9940 Mean :3.188 Mean :0.4898 Mean :10.51
## 3rd Qu.:0.9961 3rd Qu.:3.280 3rd Qu.:0.5500 3rd Qu.:11.40
## Max. :1.0390 Max. :3.820 Max. :1.0800 Max. :14.20
## quality
## Min. :3.000
## 1st Qu.:5.000
## Median :6.000
## Mean :5.878
## 3rd Qu.:6.000
## Max. :9.000
## 'data.frame': 4898 obs. of 12 variables:
## $ fixed.acidity : num 7 6.3 8.1 7.2 7.2 8.1 6.2 7 6.3 8.1 ...
## $ volatile.acidity : num 0.27 0.3 0.28 0.23 0.23 0.28 0.32 0.27 0.3 0.22 ...
## $ citric.acid : num 0.36 0.34 0.4 0.32 0.32 0.4 0.16 0.36 0.34 0.43 ...
## $ residual.sugar : num 20.7 1.6 6.9 8.5 8.5 6.9 7 20.7 1.6 1.5 ...
## $ chlorides : num 0.045 0.049 0.05 0.058 0.058 0.05 0.045 0.045 0.049 0.044 ...
## $ free.sulfur.dioxide : num 45 14 30 47 47 30 30 45 14 28 ...
## $ total.sulfur.dioxide: num 170 132 97 186 186 97 136 170 132 129 ...
## $ density : num 1.001 0.994 0.995 0.996 0.996 ...
## $ pH : num 3 3.3 3.26 3.19 3.19 3.26 3.18 3 3.3 3.22 ...
## $ sulphates : num 0.45 0.49 0.44 0.4 0.4 0.44 0.47 0.45 0.49 0.45 ...
## $ alcohol : num 8.8 9.5 10.1 9.9 9.9 10.1 9.6 8.8 9.5 11 ...
## $ quality : int 6 6 6 6 6 6 6 6 6 6 ...
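The names() output above lists 13 columns while str() reports 12, which suggests the row-index column X was dropped in between. Here is a minimal sketch of that step, assuming that is what happened. The small data frame reuses three rows and a few columns from the head() output; in the real analysis `wine` would come from reading the CSV file.

```r
# First three rows from the head() output above, with a subset of columns.
wine <- data.frame(
  X = 1:3,
  fixed.acidity = c(7.0, 6.3, 8.1),
  volatile.acidity = c(0.27, 0.30, 0.28),
  quality = c(6L, 6L, 6L)
)

# Assumption: X is only a row index, so it is removed before the analysis.
wine$X <- NULL

names(wine)
str(wine)
```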
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.855 7.300 14.200
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0800 0.2100 0.2600 0.2782 0.3200 1.1000
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2700 0.3200 0.3342 0.3900 1.6600
After adjusting the bin widths and breaks, I ended up with these plots. Fixed acidity, volatile acidity, and citric acid are approximately normally distributed. I could not explain the unusual peak in the citric acid distribution at 0.49. Fixed acidity contributes more to white wine than volatile acidity and citric acid, which is reasonable because a high amount of volatile acidity makes wine taste like vinegar. The summary data show that the mean fixed acidity is 6.855 g/dm^3 and the median is 6.800 g/dm^3. Similarly, the mean values of volatile acidity and citric acid are 0.2782 g/dm^3 and 0.3342 g/dm^3, and their medians are 0.2600 g/dm^3 and 0.3200 g/dm^3, respectively. Another notable observation is that the mean and median of each of these three columns are close to each other, which suggests that outliers have not pulled the means far from the medians. It would be interesting to check whether their contributions change with quality.
Let’s move ahead to the distribution of alcohol content in wine.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
The distribution of alcohol content is trimodal (multimodal). Alcohol content ranges from 8.0% to 14.2% by volume, with a mean of 10.51% and a median of 10.40%.
Let us now look at the distribution of residual sugar.
During winemaking, yeast consumes sugar and produces ethanol (alcohol) as a by-product. I wonder what the relation between alcohol and sugar looks like.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
There are a few outliers in residual sugar, but they have not dragged the mean far from the median. The distribution also has a long tail, so we should rescale the x axis to get a better view of it.
The distribution is bimodal. As mentioned in the introduction, it is rare to find residual sugar below 1 g/dm^3, and the distribution above confirms this. The bulk of the data ranges from 1.5 g/dm^3 to 17 g/dm^3. In the summary statistics the mean is 6.391 g/dm^3 and the median is 5.200 g/dm^3. The third quartile is 9.9 g/dm^3, meaning 75% of the data lie below it, so the maximum of 65.800 g/dm^3 is clearly an outlier.
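One way to make the outlier claim concrete is Tukey's 1.5 x IQR rule. This is a convention I am bringing in for illustration, not something the analysis above used. The sketch below applies it to the quartiles from the residual sugar summary.

```r
# Quartiles and maximum of residual sugar, copied from the summary above (g/dm^3).
q1 <- 1.7
q3 <- 9.9
max_value <- 65.8

# Tukey's rule (an assumed convention): values above Q3 + 1.5 * IQR are flagged.
iqr <- q3 - q1
upper_fence <- q3 + 1.5 * iqr

upper_fence              # threshold above which a value is flagged
max_value > upper_fence  # TRUE means the maximum is flagged as an outlier
```

Under this rule the fence sits at 22.2 g/dm^3, so the maximum lies far outside it.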
Now moving ahead to density
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9917 0.9937 0.9940 0.9961 1.0390
Density is approximately normally distributed, with outliers near 1.010 g/cm^3 and 1.039 g/cm^3. Apart from these, density ranges from 0.9871 g/cm^3 to 1.0031 g/cm^3, which is close to the density of water and depends on the percent alcohol and sugar content. We will look at the correlation between alcohol, sugar content, and density later in the analysis.
Moving our focus to sulfur dioxide.
Total sulfur dioxide is the combination of free sulfur dioxide and bound sulfur dioxide.
Alcoholic fermentation produces sulfur dioxide. We will look into the relation between alcohol and sulfur dioxide in a later section.
For now, I am creating a new feature variable called bound.sulfur.dioxide, and we will look at the distribution of all three.
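Since total sulfur dioxide is the sum of the free and bound forms, the new variable can be derived by subtraction. Here is a minimal sketch on the first four rows shown in the head() output. The column name bound.sulfur.dioxide is from the text, and the subtraction itself is my reading of how it was built.

```r
# Free and total SO2 for the first four wines in the head() output (mg/dm^3).
wine <- data.frame(
  free.sulfur.dioxide  = c(45, 14, 30, 47),
  total.sulfur.dioxide = c(170, 132, 97, 186)
)

# Bound SO2 is whatever part of the total is not free.
wine$bound.sulfur.dioxide <- wine$total.sulfur.dioxide - wine$free.sulfur.dioxide

wine
```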
Total sulfur dioxide is approximately normally distributed, mostly between 10 and 250 ppm. Some outliers are observed beyond 400 ppm.
We should zoom in on free sulfur dioxide and bound sulfur dioxide, but the distribution is fairly normal in both.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 134.0 138.4 167.0 440.0
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 23.00 34.00 35.31 46.00 289.00
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.0 78.0 100.0 103.1 125.0 331.0
All three distributions are fairly normal. Alcoholic fermentation produces total sulfur dioxide in the range of 10 to 250 ppm (parts per million).
Bound sulfur dioxide contributes more to total sulfur dioxide: it ranges from 10 to 200 ppm, whereas free sulfur dioxide ranges from 1 to 80 ppm.
Let us look at the distribution of chlorides.
Chlorides tell us the amount of salt in the wine. Chloride concentration is influenced by terroir, but we don’t have terroir data.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04577 0.05000 0.34600
Chlorides in white wine mostly range from 0.00 to 0.1 g/dm^3. There are many outliers in the tail region, so the bulk of the distribution appears narrow.
I wonder whether the chloride content changes with quality.
From a quick Google search, I understand that winemakers use pH as a way to measure ripeness in relation to acidity.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
pH ranges from 2.7 to 3.8. pH describes how acidic or basic a wine is on a scale from 0 (very acidic) to 14 (very basic).
It would be interesting to check which level of acidity receives the highest quality rating, so I will use the cut function to create acidity levels.
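Here is a sketch of how cut can turn pH into acidity levels. The level names match the ones that appear later in the output, but the break points below are hypothetical because the text does not state the ones actually used. Note that a lower pH means higher acidity, so the labels run from "High" to "Low".

```r
# pH of the first four wines in the head() output.
ph <- c(3.00, 3.30, 3.26, 3.19)

# Hypothetical break points; each interval is closed on the right by default.
acidity.levels <- cut(
  ph,
  breaks = c(-Inf, 3.0, 3.2, 3.4, Inf),
  labels = c("High", "Moderately High", "Medium", "Low")
)

data.frame(pH = ph, acidity.levels)
```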
Let’s look at quality ratings.
A quality rating of 3 is the lowest and 9 is the highest. Ratings of 5, 6, and 7 have far more data points than the remaining ratings, so I am making a new feature variable, quality bucket, in which I will try to distribute the ratings more evenly.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.000 5.000 6.000 5.878 6.000 9.000
##
## 3 4 5 6 7 8 9
## 20 163 1457 2198 880 175 5
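The table above can be used to see how a bucketing would balance the ratings. The split below (3 to 5 as Low, 6 as Medium, 7 to 9 as High) is an assumption for illustration. The bucket names appear later in the output, but the actual boundaries are not stated.

```r
# Rating counts copied from the table above.
counts <- c("3" = 20, "4" = 163, "5" = 1457, "6" = 2198,
            "7" = 880, "8" = 175, "9" = 5)
ratings <- as.integer(names(counts))

# Assumed boundaries: (2,5] -> Low, (5,6] -> Medium, (6,9] -> High.
quality_bucket <- cut(ratings, breaks = c(2, 5, 6, 9),
                      labels = c("Low", "Medium", "High"))

# Number of wines that would fall into each bucket.
bucket_counts <- tapply(counts, quality_bucket, sum)
bucket_counts
```

Under this assumed split the buckets would hold 1640, 2198, and 1060 wines, which is far more balanced than the seven raw ratings.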
The white wine dataset contains 4898 rows and 13 columns: the row index X plus 12 variables. The variables are fixed acidity, volatile acidity, citric acid, residual sugar, chlorides, free sulfur dioxide, total sulfur dioxide, density, pH, sulphates, alcohol, and quality. Most of the feature variables are approximately normally distributed, namely:
1. Fixed acidity
2. Volatile acidity
3. Citric acid
4. Density
5. Sulfur dioxide
6. pH
7. Chlorides
Alcohol has a trimodal distribution, whereas residual sugar has a bimodal distribution.
pH in our dataset ranges from 2.7 to 3.8, with a median of 3.18. Quality ratings range from 3 (lowest) to 9 (highest). The most common quality rating is 6.
Let’s plot a correlogram to get an idea of the correlations between the feature variables.
I can see a very strong correlation between residual sugar and density, and also between alcohol and density. I will first explore these strong relationships and then the other relationships mentioned in the univariate section.
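The numbers behind a correlogram are a correlation matrix, which base R computes with cor. The sketch below runs it on only the six rows shown in the head() output, so the magnitudes are not representative of the full dataset. It is meant only to show the call and the signs of the two relationships.

```r
# Three columns for the six wines in the head() output.
wine <- data.frame(
  residual.sugar = c(20.7, 1.6, 6.9, 8.5, 8.5, 6.9),
  density        = c(1.0010, 0.9940, 0.9951, 0.9956, 0.9956, 0.9951),
  alcohol        = c(8.8, 9.5, 10.1, 9.9, 9.9, 10.1)
)

# Pearson correlation between every pair of columns.
cor_matrix <- cor(wine)
round(cor_matrix, 2)
```

Even in this tiny sample, residual sugar correlates positively with density and alcohol correlates negatively with it.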
Residual sugar and density are strongly correlated: density increases with residual sugar. I wonder how residual sugar and density change with the quality rating.
As we saw in the introduction, density depends on residual sugar content and alcohol. This correlation is extremely strong and implies that the density of wine increases with residual sugar.
A strong negative relation is observed between alcohol and density: density decreases as alcohol content increases. It would be interesting to check what happens to alcohol content as quality increases.
I didn’t expect a relation between density and sulfur dioxide, but it is moderately strong, and density increases with total sulfur dioxide.
We have few data points for qualities 3, 4, 8, and 9, but even so we can observe that density lies between 0.99 g/cm^3 and 1.00 g/cm^3.
Keep in mind that I am trying to find how density varies with quality. The scatterplot was not very informative, so let’s move to a boxplot.
Density appears to decrease with quality, but I will zoom in for more detail.
## wine$quality_bucket: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9872 0.9932 0.9951 0.9952 0.9971 1.0024
## --------------------------------------------------------
## wine$quality_bucket: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9876 0.9917 0.9937 0.9940 0.9959 1.0390
## --------------------------------------------------------
## wine$quality_bucket: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.9871 0.9905 0.9917 0.9924 0.9936 1.0006
Yes, we can clearly observe that density decreases as quality increases. Let’s move ahead and look at the correlation between chlorides and density.
The relation between chlorides and density is moderately strong. Do chlorides change with quality? I will look into it later in the analysis.
The relation is moderately strong. Fixed acidity of white wine is mostly between 6 g/dm^3 and 9 g/dm^3.
Mean alcohol content increases with quality. I would also like to check the effect with a boxplot.
As expected, alcohol content increases with quality.
Residual sugar decreases with quality. So is it the case that, with increasing quality, white wines are more alcoholic and less sweet?
To be more sure, I will check it with a boxplot.
Yes, we can clearly see the pattern: residual sugar decreases with quality. So I conclude that with increasing quality wines become more alcoholic and less sweet.
Moving ahead to check how fixed acidity varies with quality.
Fixed acidity does not appear to vary with quality. Low, Medium, and High quality wines all have fixed acidity between 5 g/dm^3 and 9 g/dm^3. I will also look at their box plot.
Let’s zoom in more.
## wine$quality_bucket: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 4.200 6.400 6.800 6.962 7.500 11.800
## --------------------------------------------------------
## wine$quality_bucket: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.800 6.300 6.800 6.838 7.300 14.200
## --------------------------------------------------------
## wine$quality_bucket: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3.900 6.200 6.700 6.725 7.200 9.200
The median values are very close to each other, and I can’t see much difference between them. Fixed acidity doesn’t change with quality.
The median volatile acidity for the lowest qualities (3 and 4) is higher than for qualities 8 and 9. For more information, I will plot it in the multivariate section.
Observed pH levels range from 2.8 to 3.6 irrespective of quality. I will plot a box plot for more information.
The median pH rises slightly with each quality level, but the increment is extremely small.
I will try to plot how acidity levels are distributed among the different qualities.
The relation between chlorides and alcohol is moderately strong but negative: as the chloride content of a wine increases, alcohol content decreases.
Fermentation produces sulfur dioxide, so I expected sulfur dioxide to increase with the percentage of alcohol, but the relation I found here is the opposite.
For alcohol content between 8.5% and 10.5%, sulfur dioxide ranges from 100 ppm to 250 ppm, but as the alcohol content increases, sulfur dioxide is observed between 80 ppm and 120 ppm.
I wonder how the ratio of free sulfur dioxide to bound sulfur dioxide changes with alcohol.
For low as well as high alcohol content, free sulfur dioxide is about 0.4 times bound sulfur dioxide.
With increasing quality, the ratio of free to bound sulfur dioxide increases.
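As a rough sanity check on the free-to-bound ratio, the means from the univariate summaries can be divided directly. This is only a sketch: a ratio of means is not the same as the average of per-wine ratios, so it need not match the figure read off the plot exactly.

```r
# Means copied from the free and bound sulfur dioxide summaries above.
mean_free  <- 35.31
mean_bound <- 103.1

# Ratio of the two means, as a rough overall free-to-bound ratio.
ratio_of_means <- mean_free / mean_bound
round(ratio_of_means, 2)
```

The result is a little over 0.3, the same order as the value of roughly 0.4 quoted above.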
The amount of chlorides decreases with higher quality. I would like to check the distribution of chlorides across acidity levels.
## wine$acidity.levels: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0140 0.0350 0.0420 0.0479 0.0510 0.3010
## --------------------------------------------------------
## wine$acidity.levels: Moderately High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01400 0.03700 0.04400 0.04777 0.05175 0.27100
## --------------------------------------------------------
## wine$acidity.levels: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.00900 0.03600 0.04300 0.04447 0.04900 0.20100
## --------------------------------------------------------
## wine$acidity.levels: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.01300 0.03400 0.04200 0.04257 0.04900 0.34600
The distribution of chlorides doesn’t change noticeably with acidity levels. I am going to focus on how the other feature variables are distributed across acidity levels.
## wine$acidity.levels: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 18.0 105.0 132.0 136.4 167.8 366.5
## --------------------------------------------------------
## wine$acidity.levels: Moderately High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 28.0 113.0 142.0 143.9 174.0 344.0
## --------------------------------------------------------
## wine$acidity.levels: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 24.0 107.0 132.0 137.3 166.0 307.5
## --------------------------------------------------------
## wine$acidity.levels: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 9.0 108.0 132.0 135.7 162.0 440.0
I don’t see any variation in total sulfur dioxide across acidity levels.
## wine$acidity.levels: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2300 0.3900 0.4600 0.4752 0.5400 1.0000
## --------------------------------------------------------
## wine$acidity.levels: Moderately High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2500 0.4100 0.4700 0.4793 0.5200 1.0600
## --------------------------------------------------------
## wine$acidity.levels: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2200 0.4100 0.4700 0.4889 0.5400 1.0800
## --------------------------------------------------------
## wine$acidity.levels: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.2800 0.4300 0.5000 0.5181 0.5900 0.9700
## wine$acidity.levels: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.20 10.10 10.34 11.30 13.70
## --------------------------------------------------------
## wine$acidity.levels: Moderately High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.40 9.40 10.10 10.38 11.20 14.20
## --------------------------------------------------------
## wine$acidity.levels: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.50 9.50 10.50 10.63 11.43 14.05
## --------------------------------------------------------
## wine$acidity.levels: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.90 10.50 10.73 11.50 14.00
## wine$acidity.levels: High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.900 7.000 7.546 12.100 26.050
## --------------------------------------------------------
## wine$acidity.levels: Moderately High
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 2.025 6.400 6.895 10.200 31.600
## --------------------------------------------------------
## wine$acidity.levels: Medium
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.700 4.900 5.971 8.500 20.800
## --------------------------------------------------------
## wine$acidity.levels: Low
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.700 1.500 2.800 4.988 7.200 65.800
Residual sugar increases with acidity level: the higher the acidity in a wine, the more residual sugar the wine can have.
I have observed many interesting relationships among the feature variables.
For high quality wine, alcohol content is high while residual sugar content is low. That means higher quality wines are more alcoholic and less sweet.
Density and chloride content decrease with higher quality.
Residual sugar increases with acidity level: the higher the acidity in a wine, the more residual sugar the wine can have.
Alcohol and residual sugar are inversely related: when one increases, the other decreases.
Fermentation produces sulfur dioxide, yet total sulfur dioxide decreases as alcohol content increases. On average, free sulfur dioxide is about 0.4 times bound sulfur dioxide for low as well as high alcohol content.
Here I am going to revisit the plots from the bivariate section. I’ll start by exploring how different feature variables vary with density and quality.
We concluded in the bivariate section that density increases with residual sugar. After adding jitter and transparency and changing the plot limits, we can see the variation in residual sugar and density. The plot also shows that high quality wine has lower density even at the same level of residual sugar.
Next, I’ll look at how density vs alcohol varies with quality.
After adding jitter and transparency and changing the plot limits, we can see the strong negative relation between alcohol and density. For high quality wine, alcohol content is high whereas density is low.
After changing the plot limits, we can observe that chlorides and density have a moderate positive correlation. For higher quality wine, both chloride content and density are low.
Next, I’ll look at a scatterplot of total sulfur dioxide and density.
After adding jitter and transparency and changing the plot limits, we can observe that for low quality wine both sulfur dioxide and density are high, whereas for higher quality wine both are low.
Moving ahead, I’ll plot a scatterplot of density and fixed acidity.
After adding jitter and transparency and changing the plot limits, we can observe that for low quality wine fixed acidity and density are higher.
Next, I’ll look at how alcohol varies with different feature variables.
After changing the plot limits, the plot above shows that residual sugar and alcohol are inversely related. With increasing quality, alcohol content increases and residual sugar decreases.
After adding jitter and transparency and changing the plot limits, we can see that chloride content is high for low quality wine, while its alcohol content is low compared to high quality wine.
We know from the bivariate section that alcohol and total sulfur dioxide are negatively correlated. After adding jitter and transparency and changing the plot limits, we can see that for low quality wine sulfur dioxide is high and alcohol content is low.
Fixed acidity is between 5 g/dm^3 and 9 g/dm^3. I don’t see any variation in fixed acidity by quality, but we can observe again that alcohol content is high for high quality wine.
Quality 6 seems to overplot a lot. I will remove it to see if we get more useful information.
For quality 7, I can clearly observe that volatile acidity increases with alcohol content. For the remaining qualities, the median values either stay the same as alcohol content increases or show no clear pattern.
Fixed acidity is between 5 g/dm^3 and 10 g/dm^3. Here too I don’t see any variation in fixed acidity by quality, but we can observe again that residual sugar content is low for high quality wine.
Let’s move ahead to see how alcohol content and residual sugar vary by acidity level.
We saw in the bivariate analysis that the higher the acidity in a wine, the more residual sugar the wine can have. We can observe this clearly here: as acidity level rises, sugar increases and alcohol content decreases.
I analysed these relations in the bivariate section, but differentiating them by quality and acidity level gave me more aesthetic plots.
Residual sugar, alcohol, and density had strong relationships, and splitting them by quality bucket gave me clearer observations. For high quality wine, alcohol content is high, while density and residual sugar content are low. In sum, alcohol and residual sugar are inversely related.
For high quality wine, density, chlorides, total sulfur dioxide, and fixed acidity are low.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 8.00 9.50 10.40 10.51 11.40 14.20
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.600 1.700 5.200 6.391 9.900 65.800
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.720 3.090 3.180 3.188 3.280 3.820
To improve the plot, I used the quality feature as a factor variable, and I am happy with the final output. It clearly shows the difference: for low quality wine the pH level is low, except for quality 3, and the pH level increases for higher quality wine.
Three major features are related to quality: alcohol, residual sugar, and acidity levels. So I plotted the other two against quality as a factor variable. The results are fairly intuitive: for quality ratings below 6, alcohol content is low, and it increases markedly for high quality wine; the reverse holds for residual sugar.
I haven’t shown a new relation here, but I have tried to improve the scatter plot of alcohol against density. The two have a strong negative relation. I fitted a line and also plotted histograms of both feature variables using ggMarginal.
This plot shows that density and residual sugar are strongly correlated. Fitting a line makes the relationship easier to see.
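Fitting a line through density against residual sugar amounts to a simple linear regression. Here is a minimal base R sketch using lm on the six rows from the head() output. The original plot was presumably drawn with ggplot2, and this small sample only illustrates the direction of the fit, not its real strength.

```r
# Residual sugar (g/dm^3) and density (g/cm^3) for the six wines in the head() output.
wine <- data.frame(
  residual.sugar = c(20.7, 1.6, 6.9, 8.5, 8.5, 6.9),
  density        = c(1.0010, 0.9940, 0.9951, 0.9956, 0.9956, 0.9951)
)

# Ordinary least squares fit of density on residual sugar.
fit <- lm(density ~ residual.sugar, data = wine)

# Intercept and slope of the fitted line.
coef(fit)
```

In base graphics the fitted line can be overlaid on a scatterplot of the two columns with abline(fit).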
I started the exploration with the usual look at the dataset using the head and summary functions, then plotted histograms and frequency polygons to understand the distributions of my feature variables. I knew little about wine, so I started reading articles to understand what each feature variable means. This helped me create the new feature variable bound sulfur dioxide, and then gave me the idea of finding its ratio with free sulfur dioxide. After these steps I moved on to the bivariate section to understand the correlations among the feature variables. From the articles and the correlogram, I had the intuition that density, alcohol, residual sugar, and acidity levels are the most important features for quality. Scatterplots helped me visualize the findings and made the observations very clear. Fermentation produces sulfur dioxide, yet total sulfur dioxide decreases as alcohol content increases, which was quite surprising to me.
If we had data on the price of each wine, its terroir, and the year it was made, it would have helped gain more insight, because chloride concentration is influenced by terroir and grape type. I have read that wines get better with age, and price would have helped me distinguish between low and high quality wines.